Consistent Histograms In The Presence of Distinct Value Counts
Authors
Abstract
Self-tuning histograms have been proposed in the past as an attempt to leverage feedback from query execution. However, the focus thus far has been on histograms that store only cardinalities. In this paper, we study consistent histogram construction from query feedback that also takes distinct value counts into account. We first show how the entropy maximization (EM) principle can be leveraged to identify a distribution that approximates the data given the execution feedback while making the fewest additional assumptions. This EM model takes both distinct value counts and cardinalities into account. However, we find that it is computationally prohibitive. We thus consider an alternative formulation of consistency: for a given query workload, the goal is to minimize the L2 distance between the true and estimated cardinalities. This approach also handles both cardinalities and distinct value counts. We propose an efficient one-pass algorithm, with several theoretical properties, that models this formulation. Our experiments show that this approach produces improvements in accuracy similar to those of the EM-based approach while being significantly more efficient computationally.
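To make the entropy maximization idea concrete, here is a minimal Python sketch that fits bucket counts to cardinality feedback by iterative scaling. It is not the algorithm proposed in the paper: it handles cardinalities only (no distinct value counts), and the function name `max_entropy_buckets`, its parameters, and the feedback format (a list of bucket indices paired with an observed row count) are assumptions made for illustration.

```python
def max_entropy_buckets(n_buckets, total, feedback, iters=200):
    """Illustrative sketch only (hypothetical helper, cardinalities only).

    n_buckets: number of histogram buckets
    total:     total row count of the table
    feedback:  list of (bucket_indices, observed_cardinality) pairs,
               assumed to be mutually consistent
    Returns per-bucket row counts that satisfy the feedback while staying
    as close to uniform (maximum entropy) as the constraints allow.
    """
    # Start from the uniform distribution, which has maximum entropy.
    counts = [total / n_buckets] * n_buckets
    # The table size is itself a constraint over all buckets.
    constraints = [(list(range(n_buckets)), total)] + list(feedback)
    for _ in range(iters):
        for buckets, target in constraints:
            current = sum(counts[b] for b in buckets)
            if current > 0:
                # Rescale only the buckets covered by this constraint.
                scale = target / current
                for b in buckets:
                    counts[b] *= scale
    return counts


# Feedback: a query covering buckets 0 and 1 returned 70 of 100 rows.
counts = max_entropy_buckets(4, 100.0, [([0, 1], 70.0)])
print([round(c, 3) for c in counts])  # approximately [35, 35, 15, 15]
```

With no further information, the 70 observed rows are spread evenly over the two covered buckets and the remaining 30 rows evenly over the other two, which is the "least additional assumptions" behavior the abstract describes.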
Journal title:
- PVLDB
Volume 2, Issue
Pages -
Publication date 2009